DETAILED ACTION



Response to Arguments 

1. Applicant's arguments filed 11/15/07 have been fully considered but they are not
persuasive.

Applicant argues that Stevens et al., do not disclose a context dependent 
acoustic model storage unit storing context dependent acoustic models in a form of sub- 
word state trees (Amendment, pages 5, and 6). 

The examiner disagrees. Stevens et al. teach that the acoustic models represent
phonemes, wherein each phoneme may be represented as a triphone that includes
multiple nodes, and a triphone is a context-dependent phoneme (paragraph 78, lines 1
and 2; paragraph 75, lines 4 - 6). Representing acoustic models by phonemes,
wherein each phoneme may be represented as a triphone that includes multiple nodes,
implies using a context dependent acoustic model storage unit storing context
dependent acoustic models in a form of sub-word state trees, since each phoneme of
the acoustic models contains multiple nodes that represent a tree structure.
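
For illustration only, the notion of organizing the state sequences of several
context-dependent models in a tree structure can be sketched as a prefix tree. This is
a minimal sketch under stated assumptions and is not taken from Stevens et al. or from
the application: the triphone labels, the state names (s1, s2, ...), and the helper
names build_state_tree and count_nodes are all hypothetical.

```python
# Illustrative sketch only. State names and triphone labels are hypothetical
# and are not drawn from the cited reference or the application.

def build_state_tree(models):
    """Merge per-triphone state sequences into a nested-dict prefix tree."""
    root = {}
    for triphone, states in models.items():
        node = root
        for state in states:
            # Reuse an existing branch when the state prefix is shared.
            node = node.setdefault(state, {})
        # Record which model ends at this leaf.
        node.setdefault("#models", []).append(triphone)
    return root


def count_nodes(tree):
    """Count state nodes in the tree, ignoring the '#models' leaf markers."""
    return sum(
        1 + count_nodes(child)
        for key, child in tree.items()
        if key != "#models"
    )


# Three triphones with the same left context and center phoneme.
models = {
    "a-b+c": ["s1", "s2", "s3"],
    "a-b+d": ["s1", "s2", "s4"],
    "a-b+e": ["s1", "s5", "s6"],
}
tree = build_state_tree(models)
shared_node_count = count_nodes(tree)
flat_state_count = sum(len(states) for states in models.values())
```

In this sketch the three sequences hold nine states when stored separately, while the
tree holds fewer nodes because common leading states are stored once.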

Applicant argues that Stevens et al., do not teach performing matching between 
feature parameters of inputted speech and the developed hypotheses so as to output 
word information including a word, an accumulated score, and a beginning start frame 
with respect to a hypothesis representing a word end portion (Amendment, pages 7, 
and 8). 

The examiner disagrees. Stevens et al. teach that a recognizer receives and
processes the frames of an utterance to identify text corresponding to the utterance.
The recognizer entertains several hypotheses about the text and associates a score
with each hypothesis. The recognizer determines that a word is ending when the frame
corresponds to the last component of the model for the word. If the recognizer
determines that a word is ending, the recognizer sets a flag that indicates that the next
frame may correspond to the beginning of a word (paragraph 60, lines 1 - 4; paragraph
93, lines 4 - 9). Processing the frames of an utterance to identify text corresponding to
the utterance based on several hypotheses about the text, and associating a score with
each hypothesis, implies performing matching between feature parameters of inputted
speech and the developed hypotheses so as to output word information including a
word, an accumulated score, and a beginning start frame with respect to a hypothesis
representing a word end portion, since the recognizer can indicate whether or not a
frame represents a beginning of a word.

Claim Rejections - 35 USC § 102 

2. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that 
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(e) the invention was described in (1) an application for patent, published under section 122(b), by 
another filed in the United States before the invention by the applicant for patent or (2) a patent 
granted on an application for patent by another filed in the United States before the invention by the 
applicant for patent, except that an international application filed under the treaty defined in section 
351(a) shall have the effects for purposes of this subsection of an application filed in the United States 
only if the international application designated the United States and was published under Article 21(2) 
of such treaty in the English language. 

3. Claims 1, 4-6, and 8 are rejected under 35 U.S.C. 102(e) as being anticipated
by Stevens et al. (US PAP 2002/0138265).

As per claims 1, 6-8, Stevens et al. teach a continuous speech recognition
apparatus and method which uses, as a recognition unit, a sub-word determined 
depending on an adjacent sub-word and which uses context dependent acoustic 
models dependent on sub-word context to recognize a continuous input speech, 
comprising: 

a word lexicon in which each of words included in vocabulary is stored in a form
of a sub-word network or in a sub-word tree structure ("lexicon tree"; paragraph 90);

a language model storage unit in which language models representing
information regarding connection between words is stored ("vocabulary files contain all
of the words, and language model information"; paragraph 76);

a context dependent acoustic model storage unit in which the context dependent 
acoustic models are stored in a form of sub-word state trees in each of which state 
sequences of a plurality of sub-word models of the context dependent acoustic models 
are organized in a tree structure ("each phoneme may be represented as a triphone that 
includes multiple nodes. A triphone is a context-dependent phoneme"; paragraph 75); 

a matching unit developing hypotheses of sub-words by referencing the sub-word 
state tree representing the context dependent acoustic models, the word lexicon and 
the language models, and performing matching between the feature parameters of 
inputted speech and the developed hypotheses so as to output ("the score reflects the 
probability that a hypothesis corresponds to the user's speech"), word information 
including a word, an accumulated score and a beginning start frame with respect to a
hypothesis representing a word end portion ("a sequence of phonemes for which the 
matches were sought was the actual sequence of phonemes produced by the speaker"; 
paragraphs 60, and 169); 

and a search unit for searching the word information to generate recognition 
results (paragraphs 22, and 105). 

As per claim 4, Stevens et al., further disclose that when developing the 
hypotheses by referencing the sub-word state tree, the matching unit puts a flag on 
states connectable to each other in the sub-word state trees that represent the 
hypotheses, by using information on connectable sub-words obtained from the word 
lexicon and the language model ("sets a flag"; paragraph 93). 

As per claim 5, Stevens et al., further disclose that during a matching operation, 
the matching unit calculates scores of the developed hypotheses based on the feature 
parameters ("the score reflects the probability that a hypothesis corresponds to the 
user's speech"), and prunes the hypotheses in conformity to criteria including a 
threshold value of the scores or a quantity of hypotheses ("Hypothesis could have been
pruned"; paragraph 60, lines 1 - 6; paragraph 195, lines 13-15).
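
For illustration only, pruning under the two kinds of criteria recited in claim 5 (a
threshold value of the scores, or a quantity of hypotheses) can be sketched as follows.
This is a minimal sketch under stated assumptions and does not reproduce the method of
Stevens et al. or of the application: scores are assumed to be log-likelihoods where
higher is better, and the names prune, beam_width, and max_active are hypothetical.

```python
# Illustrative sketch only. Assumes log-likelihood scores (higher is better);
# the function and parameter names are hypothetical.

def prune(hypotheses, beam_width, max_active):
    """Keep hypotheses within beam_width of the best score, capped at max_active.

    hypotheses: list of (label, score) tuples.
    """
    if not hypotheses:
        return []
    best = max(score for _, score in hypotheses)
    # Criterion 1: score threshold relative to the best hypothesis.
    survivors = [h for h in hypotheses if h[1] >= best - beam_width]
    # Criterion 2: limit on the quantity of hypotheses kept.
    survivors.sort(key=lambda h: h[1], reverse=True)
    return survivors[:max_active]


active = [("hyp_a", -10.0), ("hyp_b", -12.5), ("hyp_c", -30.0), ("hyp_d", -11.0)]
kept = prune(active, beam_width=5.0, max_active=2)
```

Here hyp_c falls outside the score threshold, and hyp_b is then dropped by the limit on
the number of hypotheses.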

Claim Rejections - 35 USC § 103

4. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

5. Claims 2, and 3 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Stevens et al., (US PAP 2002/0138265) in view of Chen et al., (US Patent 6,006,186). 

As per claim 2, Stevens et al., further disclose that context dependent acoustic 
models stored in the context dependent acoustic model storage unit are context 
dependent acoustic models in which a center sub-word depends on sub-words 
preceding and succeeding the center sub-word respectively ("the triphone 'abc'
represents the phoneme 'b' in the context of the phonemes 'a' and 'c'"; paragraph 75).

However Stevens et al., do not specifically teach the state sequences of sub- 
word models having identical preceding sub-words and identical center sub-words are 
organized in a tree structure. 

Chen et al., teach that a shared phoneme model is generated to represent each 
of the groups of triphone phoneme models for which the number of trained frames 
available in the training library having common biphone, wherein the common biphone 
may comprise either the center context in combination with either right or left context of 
the triphone model (col. 10, lines 24 - 32). 

Therefore it would have been obvious to one of ordinary skill in the art at the time
the invention was made to use a common biphone as taught by Chen et al. in Stevens
et al., because that would make the system more efficient by retrieving the spoken
words faster.

As per claim 3, Chen et al., further disclose that the context dependent acoustic 
models are state sharing models in which a plurality of sub-word models share states 
("shared phoneme models"; col. 10, lines 24 - 32). 
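
For illustration only, grouping triphone models that have a common biphone (here, the
left context together with the center phoneme) can be sketched as follows. This is a
minimal sketch under stated assumptions and does not reproduce Chen et al., which also
conditions the grouping on the number of trained frames available; the function name
group_by_left_biphone and the phoneme labels are hypothetical.

```python
# Illustrative sketch only. Shows the grouping key (left context + center);
# it omits the training-frame criterion described in the cited reference.
from collections import defaultdict


def group_by_left_biphone(triphones):
    """Group (left, center, right) triphones by their (left, center) biphone."""
    groups = defaultdict(list)
    for left, center, right in triphones:
        groups[(left, center)].append((left, center, right))
    return dict(groups)


triphones = [("a", "b", "c"), ("a", "b", "d"), ("x", "b", "c")]
groups = group_by_left_biphone(triphones)
```

Each resulting group is a candidate for a single shared model in this sketch; the
triphones 'abc' and 'abd' fall into one group, while 'xbc' falls into another.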

Conclusion 

6. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

7. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Leonard Saint-Cyr whose telephone number is (571) 
272-4247. The examiner can normally be reached on Monday-Friday.



Application/Control Number: 10/501,502
Art Unit: 2626

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached on (571) 272-7602. The fax phone
number for the organization where this application or proceeding is assigned is (571)
273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 